Assignment 0

DATA 311 - Machine Learning

Author

Maxim Storozhuk (40965642)

Introduction

My name is Maxim Storozhuk, and I am a third-year student majoring in Computer Science.

I play on the volleyball team here at UBCO, and I am looking forward to learning a lot about Machine Learning this semester! The best ice cream flavours are:

  • Birthday Cake
  • Cookies & Cream
  • Cookie Dough

Figure 1: My cat Timbit sitting on my homework.

data <- read.csv("assignmentData/sales_data.csv") # read the sales CSV into a data frame called data
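The column names below (Order.ID, Total.Sales, ...) contain dots. A likely reason, assuming the raw CSV headers contain spaces, is that read.csv() defaults to check.names = TRUE and passes the headers through make.names(). This sketch uses a small hypothetical CSV string in place of the real file to show the effect.

```r
# Hypothetical two-row CSV standing in for sales_data.csv (not the real file)
csv_text <- "Order ID,Total Sales,Store Location
ORD1001,300,New York
ORD1002,1800,Los Angeles"

# Default check.names = TRUE: spaces in headers become dots
sales_demo <- read.csv(text = csv_text)
print(names(sales_demo))

# check.names = FALSE keeps the headers as written (refer to them with backticks)
raw_names <- names(read.csv(text = csv_text, check.names = FALSE))
print(raw_names)
```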

library(dplyr) # load dplyr (part of the tidyverse) for distinct(), filter(), and arrange()

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
str(data)
'data.frame':   30 obs. of  12 variables:
 $ Date           : chr  "2023-01-01" "2023-01-02" "2023-01-03" "2023-01-04" ...
 $ Order.ID       : chr  "ORD1001" "ORD1002" "ORD1003" "ORD1004" ...
 $ Product.Name   : chr  "Smartphone" "Laptop" "Headphones" "Tablet" ...
 $ Category       : chr  "Mobile" "Computers" "Accessories" "Mobile" ...
 $ Price          : num  300 900 50 200 300 ...
 $ Quantity.Sold  : int  1 2 3 1 2 1 1 2 1 2 ...
 $ Total.Sales    : num  300 1800 150 200 600 ...
 $ Customer.ID    : chr  "CUST500" "CUST501" "CUST502" "CUST503" ...
 $ Customer.Age   : int  34 29 42 38 25 31 27 40 35 33 ...
 $ Customer.Gender: chr  "Female" "Male" "Non-binary" "Female" ...
 $ Payment.Method : chr  "Credit Card" "Cash" "PayPal" "Credit Card" ...
 $ Store.Location : chr  "New York" "Los Angeles" "Chicago" "New York" ...
head(data)
data <- distinct(data) #removes duplicate rows
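For comparison, a base R sketch of dropping exact duplicate rows, which is what distinct(data) does when no columns are named. The small data frame is hypothetical toy data, not the assignment file.

```r
# Hypothetical toy data: row 3 is an exact copy of row 1
orders <- data.frame(
  Order.ID = c("ORD1001", "ORD1002", "ORD1001"),
  Total.Sales = c(300, 1800, 300)
)

# duplicated() flags rows identical to an earlier row; keep only the first copy
is_dup <- duplicated(orders)
deduped <- orders[!is_dup, ]

# unique() on a data frame returns the same rows in one call
print(nrow(deduped))  # 2
```

Since str() reports 30 observations and summary() still reports a length of 30, no duplicate rows appear to have been removed from this data set.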

summary(data) # per-column summaries; numeric columns with missing values would also show an NA's count
     Date             Order.ID         Product.Name         Category        
 Length:30          Length:30          Length:30          Length:30         
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
     Price         Quantity.Sold    Total.Sales      Customer.ID       
 Min.   :  39.99   Min.   :1.000   Min.   :  89.99   Length:30         
 1st Qu.: 142.49   1st Qu.:1.000   1st Qu.: 199.99   Class :character  
 Median : 249.99   Median :2.000   Median : 334.99   Mode  :character  
 Mean   : 337.32   Mean   :1.833   Mean   : 527.32                     
 3rd Qu.: 349.99   3rd Qu.:2.000   3rd Qu.: 689.98                     
 Max.   : 999.99   Max.   :5.000   Max.   :2399.97                     
  Customer.Age   Customer.Gender    Payment.Method     Store.Location    
 Min.   :22.00   Length:30          Length:30          Length:30         
 1st Qu.:29.25   Class :character   Class :character   Class :character  
 Median :34.00   Mode  :character   Mode  :character   Mode  :character  
 Mean   :34.60                                                           
 3rd Qu.:39.75                                                           
 Max.   :48.00                                                           
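summary() only reports an NA's count for numeric columns; character columns show just Length, Class, and Mode. A minimal sketch, using hypothetical toy data, of counting missing values in every column:

```r
# Hypothetical toy data with one missing age and one missing payment method
customers <- data.frame(
  Customer.Age = c(34, NA, 42),
  Payment.Method = c("Cash", "PayPal", NA)
)

summary(customers)  # NA's appears only under the numeric column

# is.na() gives a logical matrix; colSums() counts the TRUEs per column
na_counts <- colSums(is.na(customers))
print(na_counts)
```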
filtered_data <- filter(data, Store.Location == "New York") #filtered_data only contains sales made in New York

sorted_filtered <- arrange(filtered_data, desc(Total.Sales)) #sorted_filtered is now sorted in descending order by total sales, so the highest total sale is at index 1
print(paste("The highest total sale in New York was recorded on", sorted_filtered$Date[1])) # R indexes from 1, not 0
[1] "The highest total sale in New York was recorded on 2023-01-27"
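Sorting the whole subset is not strictly needed to find one maximum. A base R sketch on hypothetical toy data: subset() keeps the New York rows (like filter(), it treats an NA condition as FALSE), and which.max() finds the position of the largest Total.Sales in a single pass.

```r
# Hypothetical toy sales (not the assignment data)
sales <- data.frame(
  Date = c("2023-01-01", "2023-01-04", "2023-01-07", "2023-01-09"),
  Store.Location = c("New York", "New York", "Chicago", "New York"),
  Total.Sales = c(300, 200, 2400, 1800)
)

# Keep New York rows, then take the date of the row with the largest Total.Sales
ny <- subset(sales, Store.Location == "New York")
best_date <- ny$Date[which.max(ny$Total.Sales)]
print(best_date)  # [1] "2023-01-09"
```

The Chicago row has the largest sale overall, so it shows why the filter must come first. If several rows tie for the maximum, which.max() returns only the first of them.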
freq_table <- table(data$Payment.Method) # count how many times each payment method appears
most_used <- names(which.max(freq_table)) #most used payment method is stored in the most_used variable
print(paste("The most used payment method is", most_used)) #prints most used payment method
[1] "The most used payment method is Credit Card"
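One caveat with names(which.max(...)): if two methods tie for the highest count, only the first (in the table's alphabetical order) is returned. A sketch with hypothetical toy payment methods that returns every method tied for the top count:

```r
# Hypothetical payment methods with a tie between Cash and Credit Card
methods <- c("Cash", "Credit Card", "PayPal", "Credit Card", "Cash")

freq <- table(methods)  # counts per method, ordered alphabetically by name
print(freq)

# which.max() returns only the FIRST maximum, so a tie is hidden
first_max <- names(which.max(freq))

# Keep every method whose count equals the maximum
all_max <- names(freq)[freq == max(freq)]
print(all_max)
```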
hist(data$Customer.Age, #creates a histogram of customer age
     main = "Customer Age Histogram", #sets histogram title to customer age histogram
     xlab = "Customer Age" #sets x-axis label to customer age
     )
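By default hist() picks bin edges with Sturges' rule, and bins are right-closed. A sketch on hypothetical toy ages that fixes the breaks at 10-year intervals and uses plot = FALSE to inspect the counts without drawing:

```r
# Hypothetical ages (not the assignment data)
ages <- c(22, 25, 27, 29, 31, 34, 38, 41, 48)

# Explicit 10-year bins: [20,30], (30,40], (40,50]
h <- hist(ages, breaks = seq(20, 50, by = 10), plot = FALSE)
print(h$breaks)  # [1] 20 30 40 50
print(h$counts)  # [1] 4 3 2
```

Passing the same breaks argument to the hist() call above would draw these bins.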

plot(data$Quantity.Sold, data$Price, #creates a scatterplot of quantity vs price
     xlab = "Quantity Sold", #labels the x-axis
     ylab = "Price", #labels the y-axis
     pch = 16, # solid filled circles instead of the default open circles
     main = "Relationship Between Quantity and Price") #sets a title for the plot
Figure 2: Relationship Between Quantity and Price

We can see in Figure 2 that items that sell in larger quantities tend to cost less: as quantity sold increases, price tends to decrease.
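A correlation coefficient can put a number on the trend seen in the scatterplot. This sketch uses hypothetical toy values, not the assignment data. Because Quantity.Sold takes only a few integer values with many ties, Spearman's rank correlation (which only assumes a monotone trend) may be a reasonable companion to Pearson's.

```r
# Hypothetical toy values (not the assignment data)
quantity <- c(1, 1, 2, 2, 3, 5)
price <- c(999.99, 899.99, 349.99, 249.99, 49.99, 39.99)

# Pearson measures linear association; Spearman works on ranks
r_pearson <- cor(quantity, price)
r_spearman <- cor(quantity, price, method = "spearman")
print(round(c(pearson = r_pearson, spearman = r_spearman), 2))
```

On the real data the same call would be cor(data$Quantity.Sold, data$Price, method = "spearman"); with only 30 rows, cor.test() can also report whether the association is distinguishable from zero.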